{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color='green'> Word2Vec</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 1. Dataset Description</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataset used in this demo is the Rotten Tomatoes movie review dataset, a corpus of movie reviews used mainly for sentiment analysis. It can be downloaded from [here](https://drive.google.com/file/d/1w1TsJB-gmIkZ28d1j7sf1sqcPmHXw352/view)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Import required libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Importing some necessary libraries\n",
    "import os\n",
    "import time\n",
    "import pandas as pd\n",
    "import nltk\n",
    "import numpy as np\n",
    "import gensim\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from nltk.tokenize import word_tokenize\n",
    "from nltk.corpus import stopwords"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 2. Data Preprocessing</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Loading RT review dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the dataset manually from the link below\n",
    "# (plain wget does not work because it is a Google Drive link)\n",
    "# https://drive.google.com/file/d/1w1TsJB-gmIkZ28d1j7sf1sqcPmHXw352/view"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(480000, 2)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load RT review dataset\n",
    "data_file = \"datasets/rt_reviews.csv\"\n",
    "df = pd.read_csv(data_file, encoding=\"ISO-8859-1\")\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Dataset summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset contains 480000 reviews\n",
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 480000 entries, 0 to 479999\n",
      "Data columns (total 2 columns):\n",
      " #   Column     Non-Null Count   Dtype \n",
      "---  ------     --------------   ----- \n",
      " 0   Freshness  480000 non-null  object\n",
      " 1   Review     480000 non-null  object\n",
      "dtypes: object(2)\n",
      "memory usage: 7.3+ MB\n"
     ]
    }
   ],
   "source": [
    "# Dataset Analysis\n",
    "print(\"Dataset contains {} reviews\".format(df.shape[0]))\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check top rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Freshness</th>\n",
       "      <th>Review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>fresh</td>\n",
       "      <td>Manakamana doesn't answer any questions, yet ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>fresh</td>\n",
       "      <td>Wilfully offensive and powered by a chest-thu...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>rotten</td>\n",
       "      <td>It would be difficult to imagine material mor...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>rotten</td>\n",
       "      <td>Despite the gusto its star brings to the role...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>rotten</td>\n",
       "      <td>If there was a good idea at the core of this ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Freshness                                             Review\n",
       "0     fresh   Manakamana doesn't answer any questions, yet ...\n",
       "1     fresh   Wilfully offensive and powered by a chest-thu...\n",
       "2    rotten   It would be difficult to imagine material mor...\n",
       "3    rotten   Despite the gusto its star brings to the role...\n",
       "4    rotten   If there was a good idea at the core of this ..."
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Drop rows with NA values and shuffle the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(480000, 2)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Data Cleaning\n",
    "df = df.dropna().sample(frac=1, random_state=42)\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Convert the reviews to lower case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Freshness</th>\n",
       "      <th>Review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>312423</th>\n",
       "      <td>fresh</td>\n",
       "      <td>guardians of the galaxy is first-class, grade...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6849</th>\n",
       "      <td>rotten</td>\n",
       "      <td>for a while, life aquatic gets mileage out of...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>361455</th>\n",
       "      <td>rotten</td>\n",
       "      <td>director ken scott stresses the movie's dude-...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5537</th>\n",
       "      <td>fresh</td>\n",
       "      <td>more a snapshot then a full blown insight int...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>166017</th>\n",
       "      <td>fresh</td>\n",
       "      <td>the immigrant experience takes on a blacker-t...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259178</th>\n",
       "      <td>fresh</td>\n",
       "      <td>at its best, with soviets, americans and raim...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>365838</th>\n",
       "      <td>rotten</td>\n",
       "      <td>just friends is a dumb teen comedy.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>131932</th>\n",
       "      <td>rotten</td>\n",
       "      <td>fairly successful at faking some pretty cool ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>146867</th>\n",
       "      <td>fresh</td>\n",
       "      <td>the pacing misses a few beats and the satire ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121958</th>\n",
       "      <td>rotten</td>\n",
       "      <td>how do our flighty young heroines fight back?...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>480000 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Freshness                                             Review\n",
       "312423     fresh   guardians of the galaxy is first-class, grade...\n",
       "6849      rotten   for a while, life aquatic gets mileage out of...\n",
       "361455    rotten   director ken scott stresses the movie's dude-...\n",
       "5537       fresh   more a snapshot then a full blown insight int...\n",
       "166017     fresh   the immigrant experience takes on a blacker-t...\n",
       "...          ...                                                ...\n",
       "259178     fresh   at its best, with soviets, americans and raim...\n",
       "365838    rotten                just friends is a dumb teen comedy.\n",
       "131932    rotten   fairly successful at faking some pretty cool ...\n",
       "146867     fresh   the pacing misses a few beats and the satire ...\n",
       "121958    rotten   how do our flighty young heroines fight back?...\n",
       "\n",
       "[480000 rows x 2 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Review'] = df['Review'].str.lower()\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Remove Punctuation Marks"
   ]
  },
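  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Anything that is not a letter, digit, or space is treated as punctuation\n",
    "NON_ALNUM_PATTERN = re.compile(r'[^a-zA-Z0-9 ]')\n",
    "\n",
    "def clean_review(review):\n",
    "    # Replace each punctuation character with a single space\n",
    "    return NON_ALNUM_PATTERN.sub(' ', review)"
   ]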
  },
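  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cleanup step replaces every character that is not a letter, digit, or space with a space, so apostrophes and hyphens split words and runs of spaces can appear. The snippet below is a minimal standalone sketch of that behaviour; `clean_review_demo` and the sample sentence are hypothetical and introduced only for illustration.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Same character class as clean_review: keep letters, digits and spaces only.\n",
    "NON_ALNUM = re.compile(r'[^a-zA-Z0-9 ]')\n",
    "\n",
    "def clean_review_demo(review):  # hypothetical helper, illustration only\n",
    "    return NON_ALNUM.sub(' ', review)\n",
    "\n",
    "sample = \"the movie's first-class, grade-a fun!\"  # made-up example text\n",
    "cleaned = clean_review_demo(sample)\n",
    "print(repr(cleaned))\n",
    "\n",
    "# Optional extra step (an assumption, not part of this notebook):\n",
    "# collapse the runs of spaces left behind by the substitution.\n",
    "print(' '.join(cleaned.split()))\n",
    "```\n",
    "\n",
    "The repeated spaces are harmless for this pipeline, since the tokenizer applied in the following step ignores repeated whitespace."
   ]
  },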
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Freshness</th>\n",
       "      <th>Review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>312423</th>\n",
       "      <td>fresh</td>\n",
       "      <td>guardians of the galaxy is first class  grade...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6849</th>\n",
       "      <td>rotten</td>\n",
       "      <td>for a while  life aquatic gets mileage out of...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>361455</th>\n",
       "      <td>rotten</td>\n",
       "      <td>director ken scott stresses the movie s dude ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5537</th>\n",
       "      <td>fresh</td>\n",
       "      <td>more a snapshot then a full blown insight int...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>166017</th>\n",
       "      <td>fresh</td>\n",
       "      <td>the immigrant experience takes on a blacker t...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259178</th>\n",
       "      <td>fresh</td>\n",
       "      <td>at its best  with soviets  americans and raim...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>365838</th>\n",
       "      <td>rotten</td>\n",
       "      <td>just friends is a dumb teen comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>131932</th>\n",
       "      <td>rotten</td>\n",
       "      <td>fairly successful at faking some pretty cool ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>146867</th>\n",
       "      <td>fresh</td>\n",
       "      <td>the pacing misses a few beats and the satire ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121958</th>\n",
       "      <td>rotten</td>\n",
       "      <td>how do our flighty young heroines fight back ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>480000 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Freshness                                             Review\n",
       "312423     fresh   guardians of the galaxy is first class  grade...\n",
       "6849      rotten   for a while  life aquatic gets mileage out of...\n",
       "361455    rotten   director ken scott stresses the movie s dude ...\n",
       "5537       fresh   more a snapshot then a full blown insight int...\n",
       "166017     fresh   the immigrant experience takes on a blacker t...\n",
       "...          ...                                                ...\n",
       "259178     fresh   at its best  with soviets  americans and raim...\n",
       "365838    rotten                just friends is a dumb teen comedy \n",
       "131932    rotten   fairly successful at faking some pretty cool ...\n",
       "146867     fresh   the pacing misses a few beats and the satire ...\n",
       "121958    rotten   how do our flighty young heroines fight back ...\n",
       "\n",
       "[480000 rows x 2 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Apply the punctuation cleanup to every review\n",
    "df['Review'] = df['Review'].apply(clean_review)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tokenize and Remove stopwords"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Freshness</th>\n",
       "      <th>Review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>312423</th>\n",
       "      <td>fresh</td>\n",
       "      <td>[guardians, galaxy, first, class, grade, space...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6849</th>\n",
       "      <td>rotten</td>\n",
       "      <td>[life, aquatic, gets, mileage, quirkiness, pro...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>361455</th>\n",
       "      <td>rotten</td>\n",
       "      <td>[director, ken, scott, stresses, movie, dude, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5537</th>\n",
       "      <td>fresh</td>\n",
       "      <td>[snapshot, full, blown, insight, either, vogue...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>166017</th>\n",
       "      <td>fresh</td>\n",
       "      <td>[immigrant, experience, takes, blacker, black,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259178</th>\n",
       "      <td>fresh</td>\n",
       "      <td>[best, soviets, americans, raimus, cross, purp...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>365838</th>\n",
       "      <td>rotten</td>\n",
       "      <td>[friends, dumb, teen, comedy]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>131932</th>\n",
       "      <td>rotten</td>\n",
       "      <td>[fairly, successful, faking, pretty, cool, stu...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>146867</th>\n",
       "      <td>fresh</td>\n",
       "      <td>[pacing, misses, beats, satire, never, pops, d...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121958</th>\n",
       "      <td>rotten</td>\n",
       "      <td>[flighty, young, heroines, fight, back, shoppi...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>480000 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Freshness                                             Review\n",
       "312423     fresh  [guardians, galaxy, first, class, grade, space...\n",
       "6849      rotten  [life, aquatic, gets, mileage, quirkiness, pro...\n",
       "361455    rotten  [director, ken, scott, stresses, movie, dude, ...\n",
       "5537       fresh  [snapshot, full, blown, insight, either, vogue...\n",
       "166017     fresh  [immigrant, experience, takes, blacker, black,...\n",
       "...          ...                                                ...\n",
       "259178     fresh  [best, soviets, americans, raimus, cross, purp...\n",
       "365838    rotten                      [friends, dumb, teen, comedy]\n",
       "131932    rotten  [fairly, successful, faking, pretty, cool, stu...\n",
       "146867     fresh  [pacing, misses, beats, satire, never, pops, d...\n",
       "121958    rotten  [flighty, young, heroines, fight, back, shoppi...\n",
       "\n",
       "[480000 rows x 2 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
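   ],
   "source": [
    "# Tokenize and remove stop words\n",
    "# (the NLTK tokenizer and stop-word data must be downloaded beforehand)\n",
    "stop = set(stopwords.words('english'))  # a set gives fast membership tests\n",
    "df['Review'] = df['Review'].apply(lambda x: [item for item in word_tokenize(x) if item not in stop])\n",
    "df"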
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Label distribution summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAEqCAYAAADZMh2mAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAdFUlEQVR4nO3deZRldXnu8e8jg4hCmFrEBmwkoCFGUfsqzlw1TEZRAgKitIZIEnGBN95EyCBciYkmEQ0xcoMXZDAKOF2JokgQg0FRmnmWDkLolqFtZBJiAN/8cX4lh7KGQ/c+daiq72ets2qfd0/v6VXUw977d/ZOVSFJUpeeMOoGJElzj+EiSeqc4SJJ6pzhIknqnOEiSeqc4SJJ6pzhIjVJvpXkd2d63bb+K5Jcv7rrT7C9ryVZ0qbfnuTfOtz2AUm+0dX2NDcZLppzktyU5LWj7mNMkqOSPJjk3vb6QZKPJ9libJmq+nZVPWvAbX16uuWqaveqOrmD3hclqSRr9237n6pqlzXdtuY2w0WaGadX1QbAJsCbgKcBF/cHTBfS43/XGjl/CTVvJNk4yVeSrEzykza95bjFtk3y/ST3JPlykk361t8pyXeS3JXk8iQ7P9YequrBqroa2BdYCby3bXvnJMv79vW+JCvakc71SV6TZDfgT4B9k9yX5PK27LeSfDDJBcD9wDMnOE2XdrR0d5Lrkrymb8ajjvTGHR2d337e1fb5kvGn2ZK8NMlFbdsXJXlp37xvJTk6yQXts3wjyWaP9d9Ns4/hovnkCcCngGcAWwMPAB8ft8yBwO8AWwAPAccCJFkIfBX4C3pHH/8b+EKSBavTSFU9DHwZeMX4eUmeBbwb+B/taGdX4Kaq+jrwl/SOgp5SVc/rW+1twMHABsDNE+zyxcC/A5sBRwJf7A/OKbyy/dyo7fO743rdhN6/y7HApsAxwFeTbNq32FuAdwBPBdal92+nOc5w0bxRVauq6gtVdX9V3Qt8EHjVuMVOraqrquqnwJ8Db06yFvBW4KyqOquqfl5V5wBLgT3WoKUf0Quq8R4GngjskGSdqrqpqv59mm2dVFVXV9VDVfXgBPPvAD7WjpxOB64HXrcGvY95HXBDVZ3a9v1Z4Drg9X3LfKqqflBVDwBnADt2sF89zhkumjeSrJ/kH5PcnOQeeqd8NmrhMeaWvumbgXXo/d/+M4B92imxu5LcBbyc3hHO6loI3Dm+WFXLgPcARwF3JDktydOn2dYt08xfUY++S+3NwHTbHMTT+eUjpZvpfbYxt/VN3w88pYP96nHOcNF88l7gWcCLq2pDHjnlk75ltuqb3hp4EPgxvT/ep1bVRn2vJ1fVh1ankXbR/fXAtyeaX1WfqaqX0wu1Aj48NmuSTU53e/OFSfo/59b0jpwAfgqs3zfvaY9huz9qPfbbGlgxzXqa4wwXzVXrJFmv77U2vesRD9C7OL0JvWsP4701yQ5J1gc+AHy+XR/5NPD6JLsmWattc+cJBgRMKcnaSX4N+Cy9P+LHTLDMs5K8OskTgf9sPf+8zb4dWLQaI8KeChyaZJ0k+wC/BpzV5l0G7NfmLQb27ltvZdv3MyfZ7lnA9kne0j7bvsAOwFceY3+aYwwXzVVn0fujPPY6CvgY8CR6RyIXAl+fYL1TgZPoncpZDzgUoKpuAfakN1prJb0jmT9i8P+G9k1yH3A3cCawCnhhVf1ogmWfCHyo9XkbvWA4os37XPu5KsklA+4b4HvAdm2bHwT2rqpVbd6fA9sCPwH+D/CZsZWq6v62/AXtdOBO/Rtt2/gtekeFq4A/Bn6rqn78GHrTHBQfFiZJ6ppHLpKkzhkukqTOGS6SpM4NLVySbJXkvCTXJLk6yWGtflS7rcVl7bVH3zpHJFnWbnexa199t1ZbluTwvvo2Sb7X6qcnWbfVn9jeL2vzFw3rc0qSftnQLuind0O+LarqkiQbABcDbwTeDNxXVX87bvkd6A3PfBG9
L2b9C7B9m/0D4DeB5cBFwP5VdU2SM4AvVtVpSf4vcHlVHZfkXcBzq+r3k+wHvKmq9p2q380226wWLVrUyWeXpPni4osv/nFV/dJtkNaeaOEuVNWtwK1t+t4k1/Lob+2OtydwWlX9DPhhkmX0ggZgWVXdCJDkNGDPtr1X07tvEcDJ9IabHte2dVSrfx74eJLUFEm6aNEili5d+pg/pyTNZ0kmupfdzFxzaaelnk9vrD3Au5NckeTEJBu32kIefQuL5a02WX1T4K6qemhc/VHbavPvbsuP7+vgJEuTLF25cuWafUhJ0i8MPVySPAX4AvCeqrqH3pHFtvRuXncr8JFh9zCZqjq+qhZX1eIFC1br5raSpAkMNVySrEMvWP6pqr4IUFW3V9XDVfVz4JM8cuprBY++r9OWrTZZfRW9mw6uPa7+qG21+b/SlpckzYBhjhYLcAJwbVUd01fvv4vsm4Cr2vSZ9O5v9MQk29C7VcX36V3A366NDFsX2A84s10/OY9H7oO0hN7zMca2taRN7w18c6rrLZKkbg3tgj7wMnoPMLoyyWWt9ifA/kl2pHe31ZuA3wOoqqvb6K9r6D2k6ZB2w0CSvBs4G1gLOLE9yQ/gfcBpSf4CuJRemNF+ntoGBdxJL5AkSTPEe4s1ixcvLkeLSdJjk+Tiqlo8vu439CVJnTNcJEmdM1wkSZ0b5gV9DcGiw7866hbmlJs+9LpRtzBn+LvZrdn+u+mRiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXOGiySpc4aLJKlzhoskqXNDC5ckWyU5L8k1Sa5Oclirb5LknCQ3tJ8bt3qSHJtkWZIrkrygb1tL2vI3JFnSV39hkivbOscmyVT7kCTNjGEeuTwEvLeqdgB2Ag5JsgNwOHBuVW0HnNveA+wObNdeBwPHQS8ogCOBFwMvAo7sC4vjgHf2rbdbq0+2D0nSDBhauFTVrVV1SZu+F7gWWAjsCZzcFjsZeGOb3hM4pXouBDZKsgWwK3BOVd1ZVT8BzgF2a/M2rKoLq6qAU8Zta6J9SJJmwIxcc0myCHg+8D1g86q6tc26Ddi8TS8EbulbbXmrTVVfPkGdKfYxvq+DkyxNsnTlypWr8ckkSRMZergkeQrwBeA9VXVP/7x2xFHD3P9U+6iq46tqcVUtXrBgwTDbkKR5ZajhkmQdesHyT1X1xVa+vZ3Sov28o9VXAFv1rb5lq01V33KC+lT7kCTNgGGOFgtwAnBtVR3TN+tMYGzE1xLgy331A9uosZ2Au9uprbOBXZJs3C7k7wKc3ebdk2Sntq8Dx21ron1IkmbA2kPc9suAtwFXJrms1f4E+BBwRpKDgJuBN7d5ZwF7AMuA+4F3AFTVnUmOBi5qy32gqu5s0+8CTgKeBHytvZhiH5KkGTC0cKmqfwMyyezXTLB8AYdMsq0TgRMnqC8FnjNBfdVE+5AkzQy/oS9J6pzhIknqnOEiSeqc4SJJ6pzhIknqnOEiSeqc4SJJ6pzhIknqnOEiSeqc4SJJ6pzhIknqnOEiSeqc4SJJ6pzhIknqnOEiSeqc4SJJ6pzhIknqnOEiSeqc4SJJ6pzhIknqnOEiSerctOGS5MlJntCmt0/yhiTrDL81SdJsNciRy/nAekkWAt8A3gacNMymJEmz2yDhkqq6H9gL+ERV7QP8+nDbkiTNZgOFS5KXAAcAX221tYbXkiRpthskXN4DHAF8qaquTvJM4LyhdiVJmtXWnm6BqvpX4F+TrN/e3wgcOuzGJEmz1yCjxV6S5Brguvb+eUk+MfTOJEmz1iCnxT4G7AqsAqiqy4FXDrEnSdIsN9CXKKvqlnGlh4fQiyRpjpj2mgtwS5KXAtW+
PHkYcO1w25IkzWaDHLn8PnAIsBBYAezY3kuSNKFBjlzuq6oDht6JJGnOGCRcrkpyO/Dt9vq3qrp7uG1JkmazaU+LVdWvAvsDVwKvAy5Pctl06yU5MckdSa7qqx2VZEWSy9prj755RyRZluT6JLv21XdrtWVJDu+rb5Pke61+epJ1W/2J7f2yNn/RYP8UkqSuDPI9ly2BlwGvAJ4PXA2cPsC2TwJ2m6D+0arasb3OavvYAdiP3j3LdgM+kWStJGsB/wDsDuwA7N+WBfhw29avAj8BDmr1g4CftPpH23KSpBk0yAX9/6B3C5ivVdVLqup1VfVX061UVecDdw7Yx57AaVX1s6r6IbAMeFF7LauqG6vqv4DTgD2TBHg18Pm2/snAG/u2dXKb/jzwmra8JGmGDBIuzwdOAd6S5LtJTkly0HQrTeHdSa5op802brWFQP93aZa32mT1TYG7quqhcfVHbavNv7st/0uSHJxkaZKlK1euXIOPJEnqN8g1l8vpHQl8Cvgm8Crg/au5v+OAbekNZ74V+MhqbqcTVXV8VS2uqsULFiwYZSuSNKdMO1osyVLgicB36I0We2VV3bw6O6uq2/u2+0ngK+3tCmCrvkW3bDUmqa8CNkqydjs66V9+bFvLk6wN/EpbXpI0QwYZirx7VXVyzijJFlV1a3v7JmBsJNmZwGeSHAM8HdgO+D4QYLsk29ALjf2At1RVJTkP2JvedZglwJf7trUE+G6b/82qqi76lyQNZpBweUKSE4CnV9XubbTWS6rqhKlWSvJZYGdgsyTLgSOBnZPsCBRwE/B7AO05MWcA1wAPAYdU1cNtO+8Gzqb3gLITq+rqtov3Aacl+QvgUmCsnxOAU5MsozegYL8BPqMkqUODhMtJ9K63/Gl7/wN6Q5GnDJeq2n+C8qTrVNUHgQ9OUD8LOGuC+o30RpONr/8nsM9UvUmShmuQ0WKbVdUZwM/hFyOwvCuyJGlSg4TLT5NsSu9UFkl2oje8V5KkCQ1yWuwP6V0k3zbJBcACehfKJUma0LThUlWXJHkV8Cx6o7eur6oHh96ZJGnWmjRckry6qr6ZZK9xs7ZPQlV9cci9SZJmqamOXF5F7xv5r59gXgGGiyRpQpOGS1Ud2SZ/d+w7J5IkDWKQ0WI/THJ8Eu8uLEkayCDh8mzgX4BD6AXNx5O8fLhtSZJms0Huinx/VZ1RVXvRu/3+hsC/Dr0zSdKsNciRC0leleQTwMXAesCbh9qVJGlWG+SW+zfRuzHkGcAfVdVPh92UJGl2G+Qb+s+tqnuG3okkac4Y5LTY05Kcm+QqgCTPTfJnQ+5LkjSLDRIunwSOAB4EqKor8BkpkqQpDBIu61fV98fVHhpGM5KkuWGQcPlxkm155Jb7ewO3Tr2KJGk+G+SC/iHA8cCzk6wAfggcMNSuJEmz2pThkmQt4F1V9dokTwaeUFX3zkxrkqTZaspwqaqHx2714vdbJEmDGuS02KVJzgQ+B/wiYHyeiyRpMoOEy3rAKuDVfTWf5yJJmtQgjzl+x0w0IkmaOwa6caUkSY+F4SJJ6tyk4ZLksPbzZTPXjiRpLpjqyGXsWsvfz0QjkqS5Y6oL+tcmuQF4epIr+uoBqqqeO9zWJEmz1aThUlX7J3kacDbwhplrSZI02033Df3bgOclWRfYvpWvr6oHh96ZJGnWGuQxx68CTgFuondKbKskS6rq/CH3JkmapQb5hv4xwC5VdT1Aku2BzwIvHGZjkqTZa5DvuawzFiwAVfUDYJ3htSRJmu0GOXJZmuT/AZ9u7w8Alg6vJUnSbDfIkcsfANcAh7bXNa02pSQnJrkjyVV9tU2SnJPkhvZz41ZPkmOTLEtyRZIX9K2zpC1/Q5IlffUXJrmyrXNskky1D0nSzJk2XKrqZ1V1TFXt1V4fraqfDbDtk4DdxtUOB86tqu2Ac9t7gN2B7drrYOA46AUFcCTwYuBFwJF9YXEc8M6+9XabZh+SpBkytHuLtdFkd44r7wmc
3KZPBt7YVz+lei4ENkqyBbArcE5V3VlVPwHOAXZr8zasqgurquiNZnvjNPuQJM2Qmb5x5eZVdWubvg3YvE0vBG7pW255q01VXz5Bfap9/JIkBydZmmTpypUrV+PjSJImMrK7IrcjjhrlPqrq+KpaXFWLFyxYMMxWJGleWa1wSXLwau7v9nZKi/bzjlZfAWzVt9yWrTZVfcsJ6lPtQ5I0Q1b3yCWrud6ZwNiIryXAl/vqB7ZRYzsBd7dTW2cDuyTZuF3I3wU4u827J8lObZTYgeO2NdE+JEkzZJDvufySqvrH6ZZJ8llgZ2CzJMvpjfr6EHBGkoOAm4E3t8XPAvYAlgH30273X1V3JjkauKgt94GqGhsk8C56I9KeBHytvZhiH5KkGTLIvcW2pPdMl5fTu37xbeCwqlo+1XpVtf8ks14zwbIFHDLJdk4ETpygvhR4zgT1VRPtQ5I0cwY5LfYpeqeatgCeDvxzq0mSNKFBwmVBVX2qqh5qr5MAh1ZJkiY1SLisSvLWJGu111uBVcNuTJI0ew0SLr9D76L4bcCtwN60C+6SJE1k2gv6VXUzPuZYkvQYTBouSd4/xXpVVUcPoR9J0hww1ZHLTyeoPRk4CNgUMFwkSROaNFyq6iNj00k2AA6jd63lNOAjk60nSdKU11za81T+kN7TJ08GXtBufS9J0qSmuubyN8BewPHAb1TVfTPWlSRpVptqKPJ76X0j/8+AHyW5p73uTXLPzLQnSZqNprrmMrJnvUiSZjcDRJLUOcNFktQ5w0WS1DnDRZLUOcNFktQ5w0WS1DnDRZLUOcNFktQ5w0WS1DnDRZLUOcNFktQ5w0WS1DnDRZLUOcNFktQ5w0WS1DnDRZLUOcNFktQ5w0WS1DnDRZLUOcNFktQ5w0WS1DnDRZLUuZGES5KbklyZ5LIkS1ttkyTnJLmh/dy41ZPk2CTLklyR5AV921nSlr8hyZK++gvb9pe1dTPzn1KS5q9RHrn8z6rasaoWt/eHA+dW1XbAue09wO7Adu11MHAc9MIIOBJ4MfAi4MixQGrLvLNvvd2G/3EkSWMeT6fF9gRObtMnA2/sq59SPRcCGyXZAtgVOKeq7qyqnwDnALu1eRtW1YVVVcApfduSJM2AUYVLAd9IcnGSg1tt86q6tU3fBmzephcCt/Stu7zVpqovn6AuSZoha49ovy+vqhVJngqck+S6/plVVUlq2E20YDsYYOuttx727iRp3hjJkUtVrWg/7wC+RO+aye3tlBbt5x1t8RXAVn2rb9lqU9W3nKA+UR/HV9Xiqlq8YMGCNf1YkqRmxsMlyZOTbDA2DewCXAWcCYyN+FoCfLlNnwkc2EaN7QTc3U6fnQ3skmTjdiF/F+DsNu+eJDu1UWIH9m1LkjQDRnFabHPgS2108NrAZ6rq60kuAs5IchBwM/DmtvxZwB7AMuB+4B0AVXVnkqOBi9pyH6iqO9v0u4CTgCcBX2svSdIMmfFwqaobgedNUF8FvGaCegGHTLKtE4ETJ6gvBZ6zxs1KklbL42kosiRpjjBcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdM1wkSZ0zXCRJnTNcJEmdm7PhkmS3JNcnWZbk8FH3I0nzyZwMlyRrAf8A7A7sAOyfZIfRdiVJ88ecDBfgRcCyqrqxqv4LOA3Yc8Q9SdK8sfaoGxiShcAtfe+XAy8ev1CSg4GD29v7klw/A73NF5sBPx51E9PJh0fdgUbA381uPWOi4lwNl4FU1fHA8aPuYy5KsrSqFo+6D2k8fzdnxlw9LbYC2Krv/ZatJkmaAXM1XC4CtkuyTZJ1gf2AM0fckyTN
G3PytFhVPZTk3cDZwFrAiVV19Yjbmm883ajHK383Z0CqatQ9SJLmmLl6WkySNEKGiySpc4aLJKlzhoskqXNzcrSYRifJQnrf2P3F71ZVnT+6jiRI8lJgEY/+vTxlZA3NA4aLOpPkw8C+wDXAw61cgOGikUlyKrAtcBmP/r00XIbIocjqTLs323Or6mej7kUak+RaYIfyj92M8pqLunQjsM6om5DGuQp42qibmG88LaYu3Q9cluRc4BdHL1V16Oha0nyV5J/pnf7aALgmyfd59O/lG0bV23xguKhLZ+I93PT48bejbmA+85qLOpXkScDWVeWzcfS4kOTJwANV9fMk2wPPBr5WVQ+OuLU5zWsu6kyS19MbkfP19n7HJB7JaNTOB9Zrw+S/AbwNOGmkHc0Dhou6dBS9R0zfBVBVlwHPHF07EtA7Q3M/sBfwiaraB3jOiHua8wwXdenBqrp7XO3nI+lEekSSvAQ4APhqq/m3b8i8oK8uXZ3kLcBaSbYDDgW+M+KepPcARwBfqqqrkzwTOG+0Lc19XtBXZ5KsD/wpsEsrnQ0c7Zcq9XiQZP12ekwzwHBRZ5LsU1Wfm64mzaR2SuwE4ClVtXWS5wG/V1XvGnFrc5rnHdWlIwasSTPpY8CuwCqAqroceOUoG5oPvOaiNZZkd2APYGGSY/tmbQg8NJqupEdU1S1J+ksPT7asumG4qAs/ApYCbwAu7qvfC/yvkXQkPeKWdsv9SrIOcBhw7Yh7mvO85qLOJPnjqvrrcbXDqurvRtWTlGQz4O+A1wKh90XKw6pq1Ugbm+MMF3UmySVV9YJxtUur6vmj6knzW5K1gFOq6oBR9zLfeFpMayzJ/sBbgG3G3e5lQ+DO0XQlQVU9nOQZSdatqv8adT/zieGiLnwHuBXYDPhIX/1e4IqRdCQ94kbggvY/Pj8dK1bVMaNrae5zKLLWWFXdXFXfqqqXANfRe37GBsDyqnK0mEaiPd4YegNNvkLv790GfS8NkUcu6kySfeg9Q+Nb9C6c/n2SP6qqz4+0Mc1XL0zydOA/gL8fdTPzjRf01ZkklwO/WVV3tPcLgH+pqueNtjPNR0kOBf4A2IbecPlfzAKqqrxj9xAZLupMkiur6jf63j8BuLy/Js20JMdV1R+Muo/5xnBRJ9L7+vMJwELgs628L3BFVb1vZI1JGgnDRZ1JchXwfuDlrfTtqvrSCFuSNCJe0FeXLgZuqao/HHUjkkbLIxd1Jsl1wK8CN/Po7xM8d2RNSRoJw0WdSfKMiepVdfNM9yJptAwXSVLn/Ia+JKlzhoskqXOGi7QGkjyc5LIkVyX55yQbTbP8jkn26Hv/hiSHd9TLSUn2nqC+c5KvtOm3Jzmqi/1JUzFcpDXzQFXtWFXPofd4gUOmWX5Heo+EBqCqzqyqDw2xP2kkDBepO9+ld4cCkrwoyXeTXJrkO0melWRd4APAvu1oZ992JPHxts5JSY5ty984dhSS5AlJPpHkuiTnJDlroiOUfkl2a8tfAuzVN+sB4L62zD7tiOvyJOd3/8+h+cwvUUodaE88fA29W+BA79EDr6iqh5K8FvjLqvrtJO8HFlfVu9t6bx+3qS3o3eHg2cCZwOfphcMiYAfgqfSe/37iFL2sB3wSeDWwDDh9bF5Vnd636PuBXatqxXSn86THyiMXac08KcllwG3A5sA5rf4rwOfaLXE+Cvz6gNv7/1X186q6pm0PemHzuVa/DThvmm08G/hhVd1Qve8afHqS5S4ATkryTmCtAfuTBmK4SGvmgaraEXgGvVu5j11zORo4r12LeT2w3oDb+1nfdLpqciJV9fvAnwFbARcn2XSY+9P8YrhIHaiq+4FDgfcmWZvekcuKNvvtfYvey2N/CuIFwG+3ay+bAztPs/x1wKIk27b3+0+0UJJtq+p7VfV+YCW9kJE6YbhIHamqS4Er6P0x/2vgr5JcyqOvbZ4H7DB2QX/ATX8BWA5cQ+8U1yXA3VP08Z/AwcBX2wX9OyZZ
9G+SXNlO3X0HuHzAfqRpefsXaRZI8pSquq+duvo+8LJ2/UV6XHK0mDQ7fKWN6FoXONpg0eOdRy6SpM55zUWS1DnDRZLUOcNFktQ5w0WS1DnDRZLUuf8GSfT9OATsZswAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "rating_categories = df[\"Freshness\"].value_counts()\n",
    "ax = rating_categories.plot(kind='bar', title='Label Distribution')\n",
    "_ = ax.set(xlabel=\"Freshness label\", ylabel=\"No. of reviews\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Encode the labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "mapping = {'fresh': 1, 'rotten': 0}\n",
    "# Map the string labels to integers on the column itself\n",
    "df['Freshness'] = df['Freshness'].map(mapping)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Split train-test data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train, df_test = train_test_split(df, stratify=df[\"Freshness\"], test_size=0.1, random_state = 42)\n",
    "df_train = df_train.copy(deep=True)\n",
    "df_test = df_test.copy(deep=True)"
   ]
  },
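  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `stratify` argument asks `train_test_split` to keep the label proportions the same in the train and test splits. The snippet below is a small sketch of how that can be checked; the toy frame and the names `toy`, `toy_train`, and `toy_test` are hypothetical and stand in for the real data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Hypothetical toy data: 60 rows labelled 1 (fresh) and 40 labelled 0 (rotten).\n",
    "toy = pd.DataFrame({'Freshness': [1] * 60 + [0] * 40,\n",
    "                    'Review': ['placeholder'] * 100})\n",
    "\n",
    "toy_train, toy_test = train_test_split(\n",
    "    toy, stratify=toy['Freshness'], test_size=0.1, random_state=42)\n",
    "\n",
    "# Share of label 1 in each split; stratification keeps both at the overall share.\n",
    "print(toy_train['Freshness'].mean(), toy_test['Freshness'].mean())\n",
    "```\n",
    "\n",
    "The same check can be run on `df_train` and `df_test` with `value_counts(normalize=True)` on the `Freshness` column."
   ]
  },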
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check the shapes of the train and test sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(432000, 2)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(48000, 2)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set number of threads"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'8'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "os.environ[\"VE_OMP_NUM_THREADS\"] = '8'\n",
    "os.environ[\"VE_OMP_NUM_THREADS\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 3. Logistic Regression using Gensim Word2Vec Embeddings</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Import Gensim Word2Vec"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gensim.models import Word2Vec as Gensim_Word2Vec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generating word2vec embeddings for training vocabulary"
   ]
  },
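  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train skip-gram (sg=1) embeddings and time the run.\n",
    "# Note: `size` and `iter` are the gensim 3.x argument names\n",
    "# (gensim 4.x renamed them to `vector_size` and `epochs`).\n",
    "time_start = time.time()\n",
    "\n",
    "gensim_embeddings = Gensim_Word2Vec(df_train[\"Review\"].to_list(), size=512, min_count=2, sg=1, iter=100)\n",
    "\n",
    "time_stop = time.time()"
   ]
  },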
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check time taken by gensim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2372.7216968536377"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gensim_elapsed_time = time_stop - time_start\n",
    "gensim_elapsed_time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Store the vocab for further use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "gensim_vocab = gensim_embeddings.wv.vocab"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generating Train and Test Data Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "no_embedding = np.zeros(gensim_embeddings.vector_size)\n",
    "\n",
    "def document_vector_gensim(doc):\n",
    "    \"\"\"Create document vectors by averaging word vectors. Remove out-of-vocabulary words.\"\"\"\n",
    "    \n",
    "    vocab_doc = [word for word in doc if word in gensim_vocab]\n",
    "    \n",
    "    if len(vocab_doc) != 0:\n",
    "        return list(np.mean(gensim_embeddings.wv[vocab_doc], axis=0))\n",
    "    else:\n",
    "        return list(no_embedding)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train[\"Gensim_Embedding\"] = df_train[\"Review\"].apply(document_vector_gensim)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_test[\"Gensim_Embedding\"] = df_test[\"Review\"].apply(document_vector_gensim)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check the shape of all the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(432000, 3)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(48000, 3)"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Classification using gensim word2vec embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LogisticRegression(max_iter=10000)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Training\n",
    "model = LogisticRegression(max_iter=10000)\n",
    "model.fit(df_train[\"Gensim_Embedding\"].to_list(), df_train[\"Freshness\"].to_list())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7754814814814814"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Train Score\n",
    "gensim_train_score = model.score(df_train[\"Gensim_Embedding\"].to_list(), df_train[\"Freshness\"].to_list())\n",
    "gensim_train_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7710416666666666"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Test Score\n",
    "gensim_test_score = model.score(df_test[\"Gensim_Embedding\"].to_list(), df_test[\"Freshness\"].to_list())\n",
    "gensim_test_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 4. Logistic Regression using Frovedis Word2Vec Embeddings</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Initializing Frovedis Server"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "from frovedis.exrpc.server import FrovedisServer\n",
    "\n",
    "FrovedisServer.initialize(\"mpirun -np 1 \" + os.environ[\"FROVEDIS_SERVER\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generating word2vec embeddings for data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "from frovedis.mllib import Word2Vec as Frovedis_Word2Vec\n",
    "\n",
    "# Train on the same corpus with the same embedding size, minimum count and iteration count as the gensim run\n",
    "time_start = time.time()\n",
    "frovedis_w2v = Frovedis_Word2Vec(df_train[\"Review\"].to_list(), hiddenSize=512, minCount=2, n_iter=100)\n",
    "time_stop = time.time()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check time taken by Frovedis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "104.39237999916077"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frovedis_elapsed_time = time_stop - time_start\n",
    "frovedis_elapsed_time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Save w2v embeddings"
   ]
  },
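  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use a dedicated name for the output path so it is not confused with the classifier variable `model`\n",
    "model_path = \"./out/rt_model.txt\"\n",
    "frovedis_w2v.save(model_path)"
   ]
  },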
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check the embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['</s>',\n",
       " 'film',\n",
       " 'movie',\n",
       " 'one',\n",
       " 'like',\n",
       " 'story',\n",
       " 'much',\n",
       " 'even',\n",
       " 'good',\n",
       " 'time']"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(frovedis_w2v.wv.keys())[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-6.16584904e-02, -1.79333016e-02,  1.27072315e-04, -1.36971520e-02,\n",
       "        8.89550075e-02,  2.20617298e-02, -1.20351315e-01, -6.16060570e-02,\n",
       "        8.30795616e-02, -1.04777984e-01,  4.18959446e-02, -8.43611360e-02,\n",
       "        8.53607580e-02, -9.96442586e-02,  3.82825769e-02,  1.63230449e-01,\n",
       "       -7.22117871e-02,  2.28723362e-01,  1.54013231e-01,  3.30435447e-02,\n",
       "        7.09022060e-02,  1.23819327e-02, -4.12228554e-02, -2.60384288e-02,\n",
       "       -8.71949941e-02,  6.83157966e-02, -7.39157870e-02, -2.15297982e-01,\n",
       "        7.42161348e-02,  2.72165090e-02,  3.45337093e-02,  1.23983763e-01,\n",
       "       -2.20355149e-02,  7.72373155e-02,  8.90648067e-02,  4.04726863e-02,\n",
       "        1.93865485e-02, -2.01683477e-01, -3.57436650e-02, -1.89387411e-01,\n",
       "        2.60237083e-02,  1.15786055e-02,  3.84637192e-02, -6.14178665e-02,\n",
       "       -1.56340431e-02,  1.15998901e-01, -1.08585313e-01,  6.01873174e-02,\n",
       "        1.71462432e-01,  4.78881150e-02, -3.04641575e-01,  5.34929894e-02,\n",
       "       -8.51864964e-02, -1.40005974e-02,  1.72539398e-01, -4.48481813e-02,\n",
       "        4.59802561e-02,  4.18700241e-02,  1.46141658e-02, -1.24275021e-01,\n",
       "        1.35740802e-01,  9.05537046e-03, -1.37548462e-01,  1.61386654e-01,\n",
       "       -8.30460712e-02, -1.07483352e-02, -1.55692194e-02,  6.47299960e-02,\n",
       "        1.69707853e-02, -9.45982412e-02, -7.27198645e-02, -2.02708207e-02,\n",
       "       -3.70335206e-02,  8.95044208e-02, -4.78568599e-02,  1.26922894e-02,\n",
       "       -3.05962353e-03,  4.55656052e-02,  1.37210354e-01,  1.96463913e-01,\n",
       "        1.13073003e-03, -9.72449109e-02, -1.43410698e-01,  5.75277433e-02,\n",
       "       -1.26306519e-01,  2.07027178e-02, -5.30785844e-02, -4.18662466e-02,\n",
       "       -6.86196983e-02, -5.39912209e-02,  4.28612158e-03, -3.89083847e-02,\n",
       "       -1.33417577e-01,  1.05557129e-01,  3.41158807e-02,  5.19705415e-02,\n",
       "       -5.73388189e-02, -1.02259861e-02, -3.52113731e-02,  5.29568829e-02,\n",
       "        3.32845859e-02,  8.26780200e-02, -1.81353129e-02, -5.95269501e-02,\n",
       "       -5.56140095e-02,  6.13813922e-02, -3.96147408e-02,  4.76035941e-03,\n",
       "        5.23270518e-02, -1.28656710e-02,  1.70932010e-01,  4.71533053e-02,\n",
       "        5.76290973e-02,  1.07923970e-01, -1.25110358e-01,  7.58132292e-03,\n",
       "       -3.84449475e-02,  1.24136563e-02, -7.36235594e-03, -6.25555292e-02,\n",
       "        4.19284776e-02, -7.53675634e-03,  4.10244130e-02, -2.84012426e-02,\n",
       "        8.05918574e-02,  1.26778290e-01,  1.67448428e-02, -6.38017431e-02,\n",
       "        6.90733865e-02,  6.55935556e-02, -7.22220168e-02,  7.17702648e-03,\n",
       "       -5.75756840e-02,  1.44872531e-01, -4.09969650e-02, -1.39819756e-01,\n",
       "        6.17339052e-02,  6.31031767e-02,  3.05043869e-02, -1.76138487e-02,\n",
       "        9.86345783e-02, -3.54368128e-02, -1.27850637e-01,  1.11703528e-02,\n",
       "        2.04275940e-02, -1.23911037e-03, -1.05949700e-01, -1.56907290e-02,\n",
       "        1.83642767e-02, -5.60506284e-02,  1.60015114e-02, -7.35494569e-02,\n",
       "        5.53902835e-02, -9.84964520e-02, -1.24162622e-01, -1.41274080e-01,\n",
       "        1.48750290e-01,  7.64997825e-02,  6.96179792e-02,  1.22543052e-01,\n",
       "        1.64485142e-01,  5.93353547e-02, -1.66559219e-02, -1.55105665e-01,\n",
       "       -7.39306286e-02, -5.95648549e-02,  2.48333951e-03,  4.38599102e-02,\n",
       "       -7.90100545e-03, -1.02349810e-01,  1.77956343e-01,  1.29721656e-01,\n",
       "       -1.79272164e-02,  4.92246896e-02,  8.68332479e-03, -3.98193263e-02,\n",
       "        4.40875404e-02, -1.07451662e-01, -1.07959062e-02, -9.97408386e-03,\n",
       "       -3.08933910e-02, -5.03013581e-02,  3.47468257e-02, -1.88446939e-01,\n",
       "        2.41613691e-03,  1.64986979e-02, -8.88381153e-02,  6.41181991e-02,\n",
       "        2.67743524e-02,  4.84311068e-03,  1.37755442e-02, -8.87559447e-03,\n",
       "       -2.56146342e-02,  1.60067528e-02, -3.71894687e-02, -1.18049324e-01,\n",
       "        7.64527395e-02,  6.45107543e-03,  9.80340764e-02, -1.72293335e-02,\n",
       "        7.34979799e-03, -2.87999995e-02,  5.69299571e-02,  1.31911663e-02,\n",
       "       -2.68388372e-02, -1.21983727e-02,  3.76290716e-02,  8.04963484e-02,\n",
       "        3.37171070e-02, -3.44692543e-02, -3.07639465e-02,  2.02244207e-01,\n",
       "       -9.04050544e-02,  9.82602760e-02, -6.29972741e-02, -8.25669989e-02,\n",
       "        1.54859543e-01, -3.79355513e-02, -2.44671285e-01, -6.59500062e-02,\n",
       "       -3.81842516e-02,  2.78670248e-02,  2.72407755e-02,  5.44497930e-02,\n",
       "        1.20935719e-02, -2.16280445e-02, -3.36695313e-02,  2.24077422e-02,\n",
       "       -1.02219786e-02,  5.86878806e-02, -1.00620963e-01, -1.53494135e-01,\n",
       "        5.56738041e-02,  8.92579556e-02, -2.46097110e-02,  3.15220021e-02,\n",
       "       -5.78960553e-02, -1.42208800e-01, -1.23659834e-01, -2.45330632e-02,\n",
       "        3.34635451e-02,  7.23370761e-02,  3.43103856e-02,  4.01366428e-02,\n",
       "        4.16696146e-02,  5.70158416e-04,  1.32950684e-02,  5.25062382e-02,\n",
       "       -4.13529947e-02, -1.63544714e-03, -7.39354547e-03,  2.67487336e-02,\n",
       "       -5.21467477e-02,  1.09480675e-02,  1.82006648e-03, -8.42292607e-02,\n",
       "        8.88580233e-02, -1.39586195e-01, -5.16925156e-02, -7.92522728e-02,\n",
       "       -7.31076077e-02, -1.14203446e-01,  5.29275916e-04, -5.88358939e-02,\n",
       "       -8.78872871e-02,  1.28776282e-01,  2.99012084e-02, -4.10172716e-02,\n",
       "       -8.12877622e-03,  4.02489640e-02,  1.16633281e-01, -1.81082219e-01,\n",
       "        2.10428070e-02, -3.01319398e-02, -5.58033548e-02,  1.70868635e-01,\n",
       "       -2.94291414e-02, -1.67854950e-01,  4.34336299e-03,  1.05958708e-01,\n",
       "       -7.25993887e-02,  1.86445609e-01, -2.52844281e-02, -2.53065139e-01,\n",
       "       -2.29531273e-01,  2.82539725e-02, -5.49742728e-02, -3.03858388e-02,\n",
       "        7.73497149e-02, -3.80717441e-02, -2.61908630e-03, -5.70596494e-02,\n",
       "        1.21240407e-01, -2.08577830e-02, -7.01145753e-02, -4.22883183e-02,\n",
       "        4.00162786e-02, -4.77118231e-03,  1.05859756e-01, -3.55860554e-02,\n",
       "       -3.38952169e-02,  1.02417931e-01, -3.47055458e-02,  3.40612233e-02,\n",
       "       -3.73234749e-02, -1.66274719e-02,  1.55174702e-01,  5.00193201e-02,\n",
       "       -1.35038912e-01, -3.23892646e-02, -1.14258174e-02,  1.90194935e-01,\n",
       "       -3.42866480e-02, -3.44639309e-02,  9.73795429e-02,  2.55414844e-02,\n",
       "       -2.05362123e-03, -1.47361949e-01,  6.54455507e-03,  4.25993763e-02,\n",
       "       -1.29265994e-01, -2.03740597e-02, -9.12890434e-02,  4.51542512e-02,\n",
       "       -3.56785059e-02,  6.64848313e-02, -1.90493017e-02,  5.27040698e-02,\n",
       "       -4.83143739e-02, -1.50700331e-01,  6.76304623e-02, -4.43223231e-02,\n",
       "        2.38826871e-02,  7.86444247e-02, -1.02804266e-01, -1.14792570e-01,\n",
       "        9.85343456e-02,  1.20177343e-01, -2.10503154e-02, -5.81300259e-02,\n",
       "        7.94779062e-02, -1.51903316e-01, -1.03971705e-01,  6.64740130e-02,\n",
       "        5.26762567e-02, -9.90386009e-02,  1.13330849e-01,  9.64727923e-02,\n",
       "        5.52049913e-02, -4.76821549e-02,  4.80024293e-02, -1.09573446e-01,\n",
       "        9.70219821e-02,  7.18079321e-03,  2.16440428e-02, -3.59413326e-02,\n",
       "       -2.08275151e-02,  7.66346529e-02, -1.65131800e-02, -1.38293197e-02,\n",
       "        1.82313755e-01, -4.99696359e-02, -1.71567485e-01,  1.33188581e-02,\n",
       "        1.08451135e-01,  4.20318358e-02, -9.87679884e-03,  7.81595409e-02,\n",
       "       -5.22660231e-03,  1.18017860e-01,  1.16769001e-01,  2.92215981e-02,\n",
       "       -1.30517200e-01, -8.40616040e-03,  6.55029491e-02,  1.11109978e-02,\n",
       "       -7.48216957e-02,  4.84004728e-02, -1.56835079e-01, -5.98214790e-02,\n",
       "       -6.04468398e-02,  4.81407866e-02, -5.14177792e-02,  4.15649898e-02,\n",
       "        1.89033151e-01,  1.78043428e-03,  2.52236035e-02, -6.53326884e-03,\n",
       "        1.24587297e-01,  1.22066475e-02, -1.36783198e-01,  1.58769172e-02,\n",
       "       -6.64191507e-03, -2.91072614e-02,  1.59247573e-02, -7.05502182e-02,\n",
       "        3.03792991e-02, -9.61736143e-02, -4.68570143e-02, -5.90699129e-02,\n",
       "       -4.94772531e-02, -1.13550119e-01, -1.14183992e-01, -1.38328746e-01,\n",
       "        5.65049052e-02, -1.07148029e-02,  5.91406934e-02,  2.34333593e-02,\n",
       "       -1.39310896e-01, -5.98822581e-03,  1.31476286e-03, -1.15824044e-01,\n",
       "       -7.50942202e-03, -9.42708030e-02, -1.09477332e-02,  1.99725106e-02,\n",
       "        4.60291393e-02, -3.91480476e-02,  3.05383056e-02,  3.66805755e-02,\n",
       "        1.32886410e-01, -1.19068930e-02, -6.59228042e-02, -1.67459488e-01,\n",
       "       -3.08771878e-02, -1.19126804e-01,  1.16445981e-02, -6.47251755e-02,\n",
       "       -8.41204971e-02, -8.24186429e-02,  1.97338499e-03,  5.85365929e-02,\n",
       "       -1.95428934e-02,  8.59812275e-03,  4.14544679e-02,  1.33399395e-02,\n",
       "        5.28942905e-02, -5.76888882e-02,  7.87207633e-02, -1.10540822e-01,\n",
       "       -1.09896483e-02, -8.60429406e-02,  3.73298936e-02,  1.00718588e-01,\n",
       "       -2.25778073e-02,  2.86825988e-02, -2.50722803e-02,  7.68712461e-02,\n",
       "       -1.71538860e-01, -4.32012416e-02,  8.14743116e-02, -3.01595442e-02,\n",
       "        1.02423087e-01,  9.24319476e-02,  9.82237756e-02, -1.34705566e-02,\n",
       "       -2.27073938e-01,  4.22904827e-02,  1.58807132e-02, -1.32885829e-01,\n",
       "       -2.48339906e-01, -2.06733346e-02,  5.04176468e-02,  2.04474106e-02,\n",
       "       -1.15269632e-03,  4.70689423e-02,  9.92381126e-02,  5.31620681e-02,\n",
       "       -7.53774047e-02,  2.07630862e-02, -5.18710725e-02,  8.38588253e-02,\n",
       "        2.11707219e-01,  3.23271826e-02, -1.22237869e-01,  6.72344025e-03,\n",
       "       -9.14396942e-02, -1.01456188e-01, -7.61324391e-02,  9.31713264e-03,\n",
       "       -1.28366277e-01, -7.21365064e-02, -5.48805669e-02, -8.73771217e-03,\n",
       "       -7.98283666e-02,  1.34429783e-01, -1.40385106e-02,  1.14075944e-01,\n",
       "        1.85310794e-03,  1.58973187e-02,  4.06154357e-02, -3.93737480e-02,\n",
       "        6.10746816e-03,  1.77066028e-02, -2.02950865e-01,  1.43046543e-01,\n",
       "       -9.13245752e-02,  1.91924651e-03,  5.88713307e-03, -1.10241935e-01,\n",
       "        8.84339288e-02,  4.74574743e-03, -6.44653291e-02, -1.52415305e-01,\n",
       "        4.25813422e-02,  1.79149017e-01, -3.52186263e-02,  1.45704865e-01,\n",
       "        2.49372330e-02,  1.24042714e-02, -1.73176266e-02, -8.54401663e-02])"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frovedis_w2v.wv['film']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embeddings are of shape (61775, 512)\n"
     ]
    }
   ],
   "source": [
    "# (vocabulary size, embedding dimension)\n",
    "print(\"Embeddings are of shape ({0}, {1})\".format(\n",
    "    len(frovedis_w2v.wv.keys()), len(frovedis_w2v.wv['film'])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generating Train and Test Data Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train[\"Frovedis_Embedding\"] = list(frovedis_w2v.transform(df_train[\"Review\"].to_list(), func=np.mean))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_test[\"Frovedis_Embedding\"] = list(frovedis_w2v.transform(df_test[\"Review\"].to_list(), func=np.mean))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Classification using frovedis word2vec embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LogisticRegression(max_iter=10000)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Training\n",
    "model = LogisticRegression(max_iter=10000)\n",
    "model.fit(df_train[\"Frovedis_Embedding\"].to_list(), df_train[\"Freshness\"].to_list())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7763217592592593"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Train Score\n",
    "frov_train_score = model.score(df_train[\"Frovedis_Embedding\"].to_list(), df_train[\"Freshness\"].to_list())\n",
    "frov_train_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.77225"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Test Score\n",
    "frov_test_score = model.score(df_test[\"Frovedis_Embedding\"].to_list(), df_test[\"Freshness\"].to_list())\n",
    "frov_test_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Shutting Down Frovedis Server"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "FrovedisServer.shut_down()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 5. Results Comparison</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Frovedis train time : 104.392 sec\n",
      "Gensim train time : 2372.722 sec\n"
     ]
    }
   ],
   "source": [
    "# Word2Vec training time\n",
    "print(\"Frovedis train time : {:.3f} sec\".format(frovedis_elapsed_time))\n",
    "print(\"Gensim train time : {:.3f} sec\".format(gensim_elapsed_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LogisticRegression score using Frovedis word2vec embeddings : 0.77225\n",
      "LogisticRegression score using Gensim word2vec embeddings : 0.7710416666666666\n"
     ]
    }
   ],
   "source": [
    "# Score\n",
    "print('LogisticRegression score using Frovedis word2vec embeddings : '+ str(frov_test_score))\n",
    "print('LogisticRegression score using Gensim word2vec embeddings : '+ str(gensim_test_score))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
